Abstract. In this note I will review some of the recent results that have been obtained in the probabilistic approach to the random satisfiability problem. At the present moment the results are only heuristic. In the case of the random 3-satisfiability problem a phase transition from the satisfiable to the unsatisfiable phase is found at α = 4.267. There are other values of α that separate different regimes, and they will be described in detail. In this context the properties of the survey decimation algorithm will also be discussed.



1 Introduction 

Recently much progress [1,2] has been made in the analytic and numerical study of the random K-satisfiability problem [3,4,5,6], using the approach of survey propagation, which generalizes the older approach based on the belief-propagation algorithm¹ [7,9,10,11]. Similar results have also been obtained for the coloring of random graphs [12].

In the random K-SAT problem there are N variables σ(i) that may be true or false (the index i will sometimes be called a node). An instance of the problem is given by a set of M = αN clauses. In this note we will consider only the case K = 3. In this case each clause c is characterized by a set of three nodes (i_1^c, i_2^c, i_3^c), which take values between 1 and N, and by three Boolean variables (b_1^c, b_2^c, b_3^c), i.e. the signatures in the clause. In the random case the i and b variables are random with a flat probability distribution. Each clause c is true if the expression

$$E^c = \big(\sigma(i_1^c)\ \mathrm{XOR}\ b_1^c\big)\ \mathrm{OR}\ \big(\sigma(i_2^c)\ \mathrm{XOR}\ b_2^c\big)\ \mathrm{OR}\ \big(\sigma(i_3^c)\ \mathrm{XOR}\ b_3^c\big) \tag{1}$$

is true².



¹ The belief-propagation algorithm (sometimes called "Min-Sum") is the zero-temperature limit of the "Sum-Product" algorithm. In the language of statistical mechanics [7], the belief-propagation equations are the extension of the TAP equations for spin glasses [8,13], and the survey-propagation equations are the TAP equations generalized to the case of broken replica symmetry.

² When all the b^c are false, E^c = σ(i_1^c) OR σ(i_2^c) OR σ(i_3^c), while when all the b^c are true, E^c = NOT σ(i_1^c) OR NOT σ(i_2^c) OR NOT σ(i_3^c).



The problem is satisfiable iff we can find a set of the variables σ such that all the clauses are true (i.e. a legal configuration); in other words, we must find a truth-value assignment. The entropy [11] of a satisfiable problem is the logarithm of the number of different sets of σ variables that make all the clauses true, i.e. of the number of legal configurations.
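To make these definitions concrete, here is a minimal Python sketch that draws a random 3-SAT instance with M = αN clauses, checks equation (1) clause by clause and computes the entropy by exhaustive enumeration. The clause encoding (a tuple of (node, b) pairs, with distinct nodes drawn uniformly) and the function names are illustrative assumptions. Enumerating all 2^N configurations is only feasible for very small N, far from the large-N limit discussed in the text.

```python
import itertools
import math
import random


def random_3sat(n, alpha, rng):
    """Draw M = round(alpha * n) clauses, each on three distinct nodes with random signatures.

    A clause is a tuple of (node, b) pairs; the literal (node, b) is true when
    sigma[node] XOR b is true, as in equation (1).
    """
    m = round(alpha * n)
    return [tuple((i, rng.random() < 0.5) for i in rng.sample(range(n), 3)) for _ in range(m)]


def is_legal(sigma, clauses):
    """A configuration is legal when every clause contains at least one true literal."""
    return all(any(sigma[i] != b for i, b in c) for c in clauses)


def count_legal(n, clauses):
    """Exhaustive count of legal configurations (feasible only for small n)."""
    return sum(is_legal(s, clauses) for s in itertools.product((False, True), repeat=n))


def entropy(n, clauses):
    """Logarithm of the number of legal configurations; undefined for an unsatisfiable instance."""
    count = count_legal(n, clauses)
    if count == 0:
        raise ValueError("unsatisfiable instance: the entropy is not defined")
    return math.log(count)
```

For instance, the single clause σ(0) OR σ(1) OR σ(2) has 7 legal configurations out of 8, so its entropy is ln 7.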

The goal of the analytic approach is to find, for given α and for large values of N, the probability that a random problem (i.e. a problem with randomly chosen clauses) is satisfiable. The 0-1 law [4,6,16] is supposed to be valid: for α < α_c all random systems (with probability one when N goes to infinity) are satisfiable, and their entropy is proportional to N with a constant of proportionality that does not depend on the problem. On the other hand, for α > α_c no random system (with probability one) is satisfiable. A heuristic argument [1,2] suggests that α_c = α* ≈ 4.267, where α* can be computed using the survey-propagation equations defined later. There is already a proof [17] that the value of α* computed with the techniques of survey propagation is a rigorous upper bound to α_c (the proof has been obtained only for even K; the extension to odd K is technically difficult).

2 Results 

Generally speaking, we are interested in knowing not only the number of legal configurations, but also the properties of the set of all legal configurations. To this end it is convenient to say that two configurations are adjacent if their Hamming distance is less than εN, where ε is a small number. We can argue that in the limit of large N:

1. In the interval α < α_d ≈ 3.86 the set of all legal configurations is connected, i.e. there is a path of mutually adjacent configurations that joins any two configurations of the set. In this region the belief-propagation equations (to be defined later) have a unique solution.

2. In the interval α_d < α < α_c ≈ 4.267 the set of all the legal configurations breaks into a large number of different disconnected regions that are called by many different names in the physical literature [14,15] (states, valleys, clusters, lumps...). Roughly speaking, the set of all the legal configurations can be naturally decomposed into clusters of nearby configurations, while configurations belonging to different clusters (or regions) are not close. In the physical literature this phenomenon is called spontaneous replica symmetry breaking. The core of the approach of this note is the analysis of this phenomenon and of the methods used to tame its consequences³. The precise definition of these regions is rather complex [18]; roughly speaking, we could say that two legal configurations belong to the same region if they are in some sense adjacent, i.e. they belong to different regions if their Hamming distance is greater than εN. In this way the precise definition of these regions depends on ε; however, it can be argued that there is an interval in ε where the definition is non-trivial and independent of the value of ε: for a rigorous definition of these regions see [19,20,21]. The number of these regions is given by exp(Σ_T(α)), where Σ_T(α) is the total complexity; for large N the total complexity is asymptotically given by Σ_T(α) = NΣ(α), where Σ(α) is the complexity density. In this interval the belief-propagation equations have many solutions, and each of these solutions is associated with a different cluster. The statistical properties of the set of the solutions of the belief-propagation equations can be studied using the survey-propagation equations (to be defined later).

3. Only in the interval α_b ≈ 3.92 < α < α_c are there variables σ that are frozen, i.e. they take the same value in all the legal configurations of a region⁴. We could say that the frozen variables form the backbone of a region. It is important to realize that a given variable may simultaneously belong to the backbone of one region and not belong to the backbone of another region.

³ Other models where this phenomenon is not present, like random bipartite matching, can be analyzed in a much simpler way, although 15 years were needed from the statement of the main result (i.e. the length of the shortest matching in the infinite-N limit is ζ(2)) to the rigorous proof of this fact.
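Given a handful of legal configurations (e.g. obtained by exhaustive enumeration of a tiny instance), the notions of adjacency, regions and frozen variables can be illustrated directly. The sketch below groups configurations into regions as connected components of the graph in which two configurations are adjacent when their Hamming distance is below εN, then reads off the frozen variables (the backbone) of each region. It is only a finite-N illustration: the function names are hypothetical, and the threshold εN is taken literally, whereas the regions of the text are defined in the large-N limit, where the choice of ε becomes irrelevant.

```python
from itertools import combinations


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def clusters(solutions, eps):
    """Connected components of the adjacency graph (Hamming distance < eps * N), via union-find."""
    n = len(solutions[0])
    parent = list(range(len(solutions)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(range(len(solutions)), 2):
        if hamming(solutions[a], solutions[b]) < eps * n:
            parent[find(a)] = find(b)
    groups = {}
    for k, s in enumerate(solutions):
        groups.setdefault(find(k), []).append(s)
    return list(groups.values())


def frozen_nodes(cluster):
    """Nodes that take the same value in every configuration of the cluster (its backbone)."""
    first = cluster[0]
    return {i: first[i] for i in range(len(first)) if all(s[i] == first[i] for s in cluster)}
```

With N = 6 and ε = 0.2, for example, the configurations 000000 and 100000 form one region (nodes 1 to 5 frozen to false), while 111111 and 111110 form another (nodes 0 to 4 frozen to true).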

The arguments run as follows. Let us start with a given instance of the problem. We first write the belief-propagation equations. For each clause c that contains the node i (we will use the notation c ∈ i, although it may not be the most appropriate), p_T(i,c) is defined to be the probability that the variable σ(i) would be true in the absence of the clause c when we average over the set of all the legal configurations (p_F(i,c) = 1 − p_T(i,c) is the probability of being false). If the node i_1 were contained in only one clause, we would have



$$p_T(i_1) = u_T\big(p_T(i_2,c),\, p_T(i_3,c),\, b_1^c, b_2^c, b_3^c\big) \equiv u_T(i_1,c), \qquad p_F(i_1) = 1 - u_T(i_1,c), \tag{2}$$

where u_T is an appropriate function defined by the previous relation. An easy computation shows that when all the b are false, the variable σ(i_1) must be true if both variables σ(i_2) and σ(i_3) are false; otherwise it can take any value. Therefore in this case we have

$$u_T(i_1,c) = \frac{1}{2 - p_F(i_2,c)\, p_F(i_3,c)}. \tag{3}$$
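Equations (2) and (3), together with the exchange of T and F when a signature is true, fit in a few lines. In the hypothetical helper `clause_message` below, the cavity beliefs of the two other nodes of the clause are passed as (p_T, p_F) pairs. A literal (node, b) is false exactly when σ(node) = b, which follows from the XOR in equation (1).

```python
def clause_message(p_others, b_self, b_others):
    """Normalized message (u_T, u_F) from a clause to one of its nodes.

    p_others: (p_T, p_F) cavity beliefs of the other nodes of the clause.
    b_self, b_others: signatures of the receiving node and of the other nodes.
    """
    # Probability that every other literal of the clause is false.
    all_false = 1.0
    for (p_t, p_f), b in zip(p_others, b_others):
        all_false *= p_t if b else p_f
    # The value of sigma(i) that makes its own literal true is always allowed (weight 1);
    # the opposite value is allowed only if some other literal is true.
    w_lit_true, w_lit_false = 1.0, 1.0 - all_false
    # With b_self false the literal is true for sigma = True; a true signature swaps T and F.
    w_t, w_f = (w_lit_false, w_lit_true) if b_self else (w_lit_true, w_lit_false)
    z = w_t + w_f
    return w_t / z, w_f / z
```

With all signatures false this returns u_T = 1/(2 − p_F(i_2,c)p_F(i_3,c)), i.e. equation (3).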

In a similar way, if some of the b variables are true, we should exchange the indices T and F for the corresponding variables; e.g. if b_1^c is true, then u_T(i_1,c) becomes u_F(i_1,c). Finally we have

$$p_T(i,c) = \frac{\prod_{d\in i,\, d\neq c} u_T(i,d)}{Z_0(i,c)}, \qquad Z_0(i,c) = \prod_{d\in i,\, d\neq c} u_T(i,d) + \prod_{d\in i,\, d\neq c} u_F(i,d). \tag{4}$$

⁴ The distinction between α_d and α_b is not usually made in the literature, and sometimes it is wrongly assumed that α_b = α_d.

We note that the previous formulae can be written in a more compact way if we introduce a two-dimensional vector p with components p_T and p_F. We define the product of these vectors componentwise, i.e. c = a·b if

$$c_T = a_T b_T, \qquad c_F = a_F b_F. \tag{5}$$

If the norm of a vector is defined by

$$|a| = a_T + a_F, \tag{6}$$

the belief-propagation equations are defined to be

$$p(i,c) = \frac{\prod_{d\in i,\, d\neq c} u(i,d)}{\left|\prod_{d\in i,\, d\neq c} u(i,d)\right|}. \tag{7}$$

In total there are 3M variables p_T(i,c) and 3M equations. In the limit of large N these equations should have a unique solution in the interval α < α_d, and the solution should give the correct values for the probabilities p_T(i,c). In this interval the entropy (apart from corrections that are subleading when N goes to infinity) is given by

$$S = \sum_{i=1}^{N} \log Z_1(i) - 2 \sum_{c=1}^{M} \log Z_2(c). \tag{8}$$

Here the first sum runs over the nodes and the second one over the clauses; Z_2(c) is the probability that the clause c would be satisfied in a system where the validity of the clause c is not imposed. One finds that in the case where all the b variables are false

$$Z_2(c) = 1 - p_F(i_1,c)\, p_F(i_2,c)\, p_F(i_3,c). \tag{9}$$

In a similar way, Z_1(i) is the probability that we can find a legal configuration containing the site i, starting from a configuration where the site i is not present, and it is given by

$$Z_1(i) = \Big|\prod_{d\in i} u(i,d)\Big|. \tag{10}$$
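The belief-propagation equations and the entropy can be checked on a tiny instance whose factor graph is a tree, since on a tree the equations are exact. The sketch below is an illustration under stated assumptions, not the author's code. It iterates equation (7) sequentially and keeps the clause messages unnormalized: u_T = 1 and u_F = 1 − p_F(i_2,c)p_F(i_3,c) when all b are false, whose normalized form is equation (3). It then evaluates the entropy in the Bethe form S = Σ_i ln Z_1(i) − Σ_c (|c| − 1) ln Z_2(c), which for K = 3 carries a factor 2 in front of the clause sum. For the two-clause instance used in the check, the result is ln 24, matching exhaustive counting.

```python
import math
from collections import defaultdict


def bp_entropy(n, clauses, iters=200, tol=1e-12):
    """Iterate the belief-propagation equations and return an estimate of the entropy S.

    clauses: list of tuples of (node, b); literal (node, b) is false when sigma[node] == b.
    Assumes unnormalized clause messages and S = sum_i ln Z1(i) - sum_c (|c| - 1) ln Z2(c).
    """
    node_clauses = defaultdict(list)
    for c, cl in enumerate(clauses):
        for i, _ in cl:
            node_clauses[i].append(c)
    # Cavity beliefs p[(i, c)] = (p_T, p_F) of node i in the absence of clause c.
    p = {(i, c): (0.5, 0.5) for c, cl in enumerate(clauses) for i, _ in cl}

    def q(j, c, b):
        # Probability that literal (j, b) of clause c is false in the absence of c.
        return p[(j, c)][0] if b else p[(j, c)][1]

    def u(i, c):
        # Unnormalized message (u_T, u_F) from clause c to node i; T and F swap when b_i is true.
        prod, b_i = 1.0, False
        for j, b in clauses[c]:
            if j == i:
                b_i = b
            else:
                prod *= q(j, c, b)
        return (1.0 - prod, 1.0) if b_i else (1.0, 1.0 - prod)

    def product(i, skip=None):
        # Componentwise product (5) of the messages reaching node i, excluding clause `skip`.
        t, f = 1.0, 1.0
        for d in node_clauses[i]:
            if d != skip:
                ut, uf = u(i, d)
                t, f = t * ut, f * uf
        return t, f

    for _ in range(iters):
        delta = 0.0
        for i, c in p:
            t, f = product(i, skip=c)
            new = (t / (t + f), f / (t + f))  # normalization (6)-(7)
            delta = max(delta, abs(new[0] - p[(i, c)][0]))
            p[(i, c)] = new
        if delta < tol:
            break

    s = sum(math.log(sum(product(i))) for i in range(n))  # ln Z1(i); ln 2 for a free node
    for c, cl in enumerate(clauses):
        z2 = 1.0 - math.prod(q(i, c, b) for i, b in cl)  # equation (9) for arbitrary signatures
        s -= (len(cl) - 1) * math.log(z2)
    return s
```

On loopy instances the same iteration may fail to converge, or converge to a fixed point that only approximates the true entropy, in line with the limitations discussed in the text.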

The belief-propagation equations can also be written as

$$\frac{\partial S}{\partial p(i,c)} = 0. \tag{11}$$



The belief-propagation equations can be formally derived by a local analysis, assuming that in the set of the legal configurations of the system where all the clauses c ∈ i are removed, the variables σ(k) that would enter these clauses are uncorrelated. This is not true for finite N, but the statement may be correct in the limit N → ∞ with the appropriate qualifications.

Generally speaking, for a given sample these equations may not have an exact solution, but they do have quasi-solutions (i.e. approximate solutions, where the approximation becomes better and better as N goes to infinity⁵): these equations have been derived using a local analysis that is correct only in the limit N → ∞.

In the interval α_d < α < α_b the variables p_F and p_T are different from 0 and 1; however, in the region α_b < α < α_c the solutions (or quasi-solutions) of the belief equations have a fraction of the variables p_F and p_T that are equal to 0 or 1.

When the number of solutions of the belief-propagation equations is large, the properties of the set of solutions of the belief-propagation equations can be obtained by computing the solution of the survey-propagation equations, defined as follows. In the general approach we introduce, for each node, the probability P_{i,c}(p) of finding a solution of the belief-propagation equations with p_T(i,c) = p. With some effort we can write down local equations for this probability. These are the full survey equations, which allow the computation of the total entropy.

This approach is computationally heavy. As far as the computation of the complexity is concerned, we can use a simpler approach, where we keep only a small part of the information contained in the function P_{i,c}(p), i.e. the weights of the two delta functions at p = 0 and p = 1. More precisely, we introduce the quantity s_T(i,c), defined as the probability of finding p_T(i,c) = 1; in the same way s_F(i,c) is the probability of finding p_T(i,c) = 0, and s_I(i,c) is the probability of finding 0 < p_T(i,c) < 1. It is remarkable that it is possible to write closed equations also for these probabilities (these equations are usually called the survey-propagation equations [1]).

We can use a more compact notation by introducing a three-dimensional vector s given by

$$s = \{s_T, s_I, s_F\}. \tag{12}$$

Everything works as before, with the only difference that we have a three-component vector instead of a two-component vector. Generalizing the previous arguments, one can introduce the quantity u(i,c), which is the value that the survey at i would take if only the clause c were present in i (in other words, u(i,c) is the message that arrives at the site i coming from the clause c). In the case where all the b are false, a simple computation gives

$$u(i_1,c) = \{s_F(i_2,c)\, s_F(i_3,c),\; 1 - s_F(i_2,c)\, s_F(i_3,c),\; 0\}. \tag{13}$$

⁵ More precisely, if we have equations E_i[a] = 0 for i = 1, ..., N, a solution a of this system of equations satisfies the condition N^{-1} Σ_{i=1,N} (E_i[a])² = 0; a quasi-solution satisfies the weaker condition N^{-1} Σ_{i=1,N} (E_i[a])² < h(N), where h(N) is a function that goes to zero as N goes to infinity. The definition of a quasi-solution depends on the properties of the function h(N), and this point must be further investigated: it may turn out in the end that quasi-solutions are not needed.



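The formula can be generalized as before⁶ to the case of different values of b. One finally finds the survey-propagation equations

$$s(i,c) = \frac{\prod_{d\in i,\, d\neq c} u(i,d)}{\left|\prod_{d\in i,\, d\neq c} u(i,d)\right|}, \tag{14}$$

where the norm is now |a| = a_T + a_I + a_F and we have defined the product in such a way that

$$a\,b = \{a_T b_T + a_I b_T + a_T b_I,\; a_I b_I,\; a_F b_F + a_I b_F + a_F b_I\}. \tag{15}$$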
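The three-component product of equation (15), the norm |a| = a_T + a_I + a_F and the clause message of equation (13) with its sign variants can be written as follows. The function names are hypothetical, and surveys are ordered (s_T, s_I, s_F) as in equation (12). Note that (0, 1, 0), a free node, is the identity of the product, and that a frozen-true survey times a frozen-false one has zero norm: this is the contradiction that the normalization in equation (14) removes.

```python
def sp_product(a, b):
    """Product of two surveys (s_T, s_I, s_F) as in equation (15).

    Frozen-true combined with frozen-true or free stays frozen-true, free with free
    stays free, and contradictory T-F pairs are dropped (the result is not normalized).
    """
    a_t, a_i, a_f = a
    b_t, b_i, b_f = b
    return (a_t * b_t + a_i * b_t + a_t * b_i, a_i * b_i, a_f * b_f + a_i * b_f + a_f * b_i)


def sp_norm(a):
    """|a| = a_T + a_I + a_F, the three-component analogue of equation (6)."""
    return sum(a)


def sp_message(s_others, b_self, b_others):
    """Message u(i, c) from a clause to one of its nodes: equation (13) and its sign variants.

    The clause freezes sigma(i) only when every other literal is frozen false.
    """
    frozen_false = 1.0
    for (s_t, _s_i, s_f), b in zip(s_others, b_others):
        frozen_false *= s_t if b else s_f
    # With b_self false the clause can only force sigma(i) to be true; a true signature swaps T and F.
    if b_self:
        return (0.0, 1.0 - frozen_false, frozen_false)
    return (frozen_false, 1.0 - frozen_false, 0.0)
```

In every message one of the two frozen components is zero, which is the property u_T u_F = 0 mentioned in the footnote below.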

It is convenient to introduce the reduced complexity Σ_R(α), which counts the number of solutions of the belief equations where two different solutions are considered equal if they coincide at the points where the beliefs are 0 or 1⁷. In other words, two solutions of the belief equations with an identical backbone enter only once in the counting that leads to the reduced complexity.

If there is a unique solution to the survey-propagation equations, it is possible to argue that the reduced total complexity should be given by

$$\Sigma_R = \sum_{i=1}^{N} \ln Z_1(i) - 2 \sum_{c=1}^{M} \ln Z_2(c), \tag{16}$$

where now the Z's are defined using the surveys, not the beliefs:

$$Z_1(i) = \Big|\prod_{d\in i} u(i,d)\Big|, \qquad Z_2(c) = \big|s(i,c)\, u(i,c)\big|. \tag{17}$$

(With the product (15), the second quantity is the same for every node i ∈ c.)

The reduced complexity Σ_R(α) is particularly interesting because it has been conjectured that it should vanish at the critical point α_c. This allows the computation of the point α_c.
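Putting the pieces together, a minimal sketch iterates the survey equations (14) on a given instance and evaluates the reduced complexity as Σ_R = Σ_i ln Z_1(i) − Σ_c (|c| − 1) ln Z_2(c). The sketch rests on three assumptions: sequential updates from random initial surveys, a fixed tolerance, and Z_2(c) evaluated as 1 − ∏_j q_j, where q_j is the probability that the literal of node j is frozen false. For the product (15) this expression equals |s(i,c) u(i,c)| for every i ∈ c. On a tree-like instance the iteration collapses onto the trivial solution s = (0, 1, 0) with Σ_R = 0. For finite random instances the returned number is only an estimate, and it is meaningful only if the survey equations have a unique solution.

```python
import math
import random
from collections import defaultdict


def _prod(a, b):
    # Survey product of equation (15).
    return (a[0] * b[0] + a[1] * b[0] + a[0] * b[1], a[1] * b[1],
            a[2] * b[2] + a[1] * b[2] + a[2] * b[1])


def survey_propagation(n, clauses, iters=500, tol=1e-10, seed=0):
    """Iterate equation (14) from random surveys; return (surveys, reduced complexity estimate)."""
    rng = random.Random(seed)
    node_clauses = defaultdict(list)
    for c, cl in enumerate(clauses):
        for i, _ in cl:
            node_clauses[i].append(c)
    s = {}
    for c, cl in enumerate(clauses):
        for i, _ in cl:
            w = [rng.random() for _ in range(3)]
            s[(i, c)] = tuple(x / sum(w) for x in w)

    def q(j, c, b):
        # Probability that literal (j, b) is frozen false in the absence of clause c.
        return s[(j, c)][0] if b else s[(j, c)][2]

    def u(i, c):
        # Message of equation (13), with T and F exchanged when b_i is true.
        frozen, b_i = 1.0, False
        for j, b in clauses[c]:
            if j == i:
                b_i = b
            else:
                frozen *= q(j, c, b)
        return (0.0, 1.0 - frozen, frozen) if b_i else (frozen, 1.0 - frozen, 0.0)

    def incoming(i, skip=None):
        acc = (0.0, 1.0, 0.0)  # identity of the product: a node with no constraints
        for d in node_clauses[i]:
            if d != skip:
                acc = _prod(acc, u(i, d))
        return acc

    for _ in range(iters):
        delta = 0.0
        for key in s:
            acc = incoming(*key)  # key = (i, c): all clauses of i except c
            z = sum(acc)
            if z == 0.0:
                raise ArithmeticError(f"contradiction at node {key[0]}")
            new = tuple(x / z for x in acc)
            delta = max(delta, max(abs(x - y) for x, y in zip(new, s[key])))
            s[key] = new
        if delta < tol:
            break

    # Reduced complexity: sum_i ln Z1(i) - sum_c (|c| - 1) ln Z2(c), with Z2(c) = 1 - prod_j q_j.
    sigma_r = sum(math.log(sum(incoming(i))) for i in range(n))
    for c, cl in enumerate(clauses):
        z2 = 1.0 - math.prod(q(i, c, b) for i, b in cl)
        sigma_r -= (len(cl) - 1) * math.log(z2)
    return s, sigma_r
```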

It is interesting that also in this case the survey-propagation equations can be written in a simple form:

$$\frac{\partial \Sigma_R}{\partial s(i,c)} = 0. \tag{18}$$
One finally finds that the survey-propagation equations do not have a unique solution when α > α_U ≈ 4.36. This fact is not of direct importance because α_U > α_c. Indeed, in the region α > α_c the complexity is negative, so that there are no solutions of the belief-propagation equations associated with the solution of the survey-propagation equations.

It is evident that for α > α_c there are no more legal configurations and Σ_R(α) is not well defined. A negative value of Σ_R(α) can be interpreted by saying that the probability of finding a legal configuration goes to zero exponentially with N. We stress that the entropy density remains finite at α_c; the conjecture that Σ_R(α) vanishes at α_c implies that the reduced complexity captures the number of essentially different regions of legal configurations. A priori a finite value of Σ_R(α_c) cannot be excluded.

⁶ It always happens that at least one of the components u_T and u_F of the vector u vanishes (u_T u_F = 0). This fact may be used to further simplify the analysis.

⁷ It is not clear at the present moment whether there are different solutions of the belief equations that coincide at all the points where the probability is 0 or 1: in other words, we would like to know if Σ_R(α) = Σ(α), where Σ(α) counts the total number of solutions of the belief equations.

3 Methods 

We now show how to obtain the above results on the solutions of the belief-propagation equations (and of the survey-propagation equations) for a large random system in the limit of large N. These equations are interesting especially in the infinite-N limit, where the factor graph does not contain short loops. For finite N in the random case (or in the infinite-N limit for a non-random case) the belief equations may have solutions, but the properties of these solutions do not represent exactly the properties of the system. If the number of short loops is small, perturbative techniques may be used to compute the corrections to the belief equations. If short loops are very common (e.g. if the literals are on the sites of an f.c.c. lattice and the clauses are on the faces of the same lattice), it is rather likely that the belief equations are useless and could only be used as the starting point of much more sophisticated approaches.

We attack this problem by studying the solution of the belief-propagation equations (and of the survey-propagation equations) on a random infinite tree. Sometimes the solution is unique, i.e. it does not depend on the boundary conditions at infinity, and sometimes it is not unique. The statistical properties of the solution of these equations can be studied with probabilistic methods. One arrives at integral equations that can be solved numerically using the method of population dynamics [1,2,14,15]. The numerical solutions of these integral equations can be used to compute α_d, α_b, α_c and α_U.
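The population-dynamics method can be sketched for the surveys of random 3-SAT. A population of cavity surveys represents their probability distribution on the infinite random tree. Each update draws the number of other clauses of a node from a Poisson law with mean 3α (as for a node of a random formula with M = αN clauses) and draws random signatures with probability 1/2. It then builds the messages of equation (13) from surveys sampled from the population and replaces a random member with their normalized product. Population size, number of sweeps and the simple Poisson sampler are illustrative choices. The sketch stops at the distribution itself: quantities such as the complexity would require further averages not shown here. For small α the population collapses onto the trivial survey (0, 1, 0).

```python
import math
import random


def _prod(a, b):
    # Survey product of equation (15).
    return (a[0] * b[0] + a[1] * b[0] + a[0] * b[1], a[1] * b[1],
            a[2] * b[2] + a[1] * b[2] + a[2] * b[1])


def poisson(lam, rng):
    # Knuth's multiplication method; adequate for the small means (3 * alpha) used here.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sp_population_dynamics(alpha, size=2000, sweeps=100, seed=0):
    """Population of cavity surveys (s_T, s_I, s_F) approximating their distribution
    for random 3-SAT on the infinite random tree."""
    rng = random.Random(seed)
    pop = []
    for _ in range(size):
        w = [rng.random() for _ in range(3)]
        pop.append(tuple(x / sum(w) for x in w))

    for _ in range(sweeps * size):
        acc = (0.0, 1.0, 0.0)
        # Number of other clauses containing the cavity node: Poisson with mean 3 * alpha (K = 3).
        for _ in range(poisson(3 * alpha, rng)):
            frozen = 1.0
            for _ in range(2):  # the two other nodes of the clause, with random signatures
                s = rng.choice(pop)
                frozen *= s[0] if rng.random() < 0.5 else s[2]
            # The random signature of the cavity node decides which value the clause may force.
            u = (frozen, 1.0 - frozen, 0.0) if rng.random() < 0.5 else (0.0, 1.0 - frozen, frozen)
            acc = _prod(acc, u)
        z = sum(acc)
        if z > 0.0:  # skip (rare) fully contradictory samples
            pop[rng.randrange(size)] = tuple(x / z for x in acc)
    return pop
```

Monitoring, e.g., the mean of s_T + s_F as α grows is one way to locate where a non-trivial distribution appears. The sketch does not claim any specific threshold value.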

The generalization of the Aldous construction [22,23] of an infinite rooted tree associated with a graph can act as a bridge between a finite instance of the problem and the infinite random tree on which the analytic computations [1,2,14,15] are done. For example, it could be used to prove the existence and the uniqueness of the solutions of the belief and survey-propagation equations in the appropriate intervals.

In this way one can argue that the properties on an infinite random tree are relevant for the behaviour of a given random system in the limit of large N.

We can check that the results we have obtained in this way for the solution of the belief and survey-propagation equations are correct by computing explicitly the solution (when it is unique) of the equations for a given sample with large N (e.g. N = 10^4 to 10^5). For example, we may compare the distribution of the beliefs or of the surveys in a large system with the one obtained by solving the integral equations for the probabilities: the agreement is usually very good.

In the same spirit, the validity of the result for α_d may be checked by studying the convergence of the iterative procedure for finding the solution of the belief-propagation equations on a given large problem. One finds that just at α_d the iterative procedure for finding a solution no longer converges, and this is likely a sign of the existence of many solutions to the belief-propagation equations. In a similar way we can check the correctness of α_U.

4 Survey decimation algorithm 

The survey decimation algorithm has been proposed [1,2,25,26,27] for finding solutions of the random K-satisfiability problem [4,5,6].

We start by solving the survey-propagation equations. If a survey s(i) is very near to (1, 0, 0) (or to (0, 0, 1)), then in most of the solutions of the belief equations (and consequently in the legal configurations) the corresponding local variable will be true (or false).

The main step in the decimation procedure consists in starting from a problem with N variables and considering a problem with N − 1 variables, where σ(i) is fixed to be true (or false). We denote

$$\Delta(i) = \Sigma^{(N)} - \Sigma^{(N-1)}_i, \tag{19}$$

where Σ^{(N)} and Σ^{(N−1)}_i are the complexities of the original and of the reduced problem. If Δ(i) is small, the second problem is easier to solve: it has nearly the same number of solutions of the belief equations and one variable less. (We assume that the complexity can be computed by solving the survey-propagation equations.)

The decimation algorithm proceeds as follows. We reduce the number of variables by one, choosing the node i in the appropriate way, e.g. by choosing the node with minimal Δ(i). We recompute the solution of the survey equations and we reduce the number of variables again. At the end of the day three things may happen:

1. We arrive at a negative complexity (in this case the reduced problem should have no solutions and we are lost).

2. The denominator in equation (14) becomes zero, signaling the presence of a contradiction (and also in this case we are lost).

3. The non-trivial solution of the survey equations disappears. If this happens, the reduced problem is now easy to solve.
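A compact version of this loop is sketched below for small instances. Two substitutions should be kept in mind. First, the node to fix is chosen by the polarization |w_T − w_F| of its full survey (the normalized product of all incoming messages); this is a common, simpler heuristic and not the minimal-Δ(i) rule of the text. Second, the residual problem left when the surveys become trivial is finished by exhaustive search, a placeholder that only works for tiny residual problems. The sketch detects case 2 (a vanishing denominator, or an empty clause after simplification) but does not monitor the complexity, so case 1 goes undetected.

```python
import itertools
import random
from collections import defaultdict

FREE = (0.0, 1.0, 0.0)  # identity of the survey product: an unconstrained node


def _prod(a, b):
    # Survey product of equation (15); contradictory T-F pairs are dropped.
    return (a[0] * b[0] + a[1] * b[0] + a[0] * b[1], a[1] * b[1],
            a[2] * b[2] + a[1] * b[2] + a[2] * b[1])


def _full_surveys(clauses, iters, tol, rng):
    """Solve the survey equations; return {node: (w_T, w_I, w_F)} or None on a contradiction."""
    node_clauses = defaultdict(list)
    for c, cl in enumerate(clauses):
        for i, _ in cl:
            node_clauses[i].append(c)
    s = {}
    for c, cl in enumerate(clauses):
        for i, _ in cl:
            w = [rng.random() for _ in range(3)]
            s[(i, c)] = tuple(x / sum(w) for x in w)

    def u(i, c):
        frozen, b_i = 1.0, False
        for j, b in clauses[c]:
            if j == i:
                b_i = b
            else:  # probability that literal (j, b) is frozen false
                frozen *= s[(j, c)][0] if b else s[(j, c)][2]
        return (0.0, 1.0 - frozen, frozen) if b_i else (frozen, 1.0 - frozen, 0.0)

    def normalized(i, skip=None):
        acc = FREE
        for d in node_clauses[i]:
            if d != skip:
                acc = _prod(acc, u(i, d))
        z = sum(acc)
        return None if z == 0.0 else tuple(x / z for x in acc)

    for _ in range(iters):
        delta = 0.0
        for key in s:
            new = normalized(*key)
            if new is None:
                return None  # vanishing denominator in equation (14)
            delta = max(delta, max(abs(x - y) for x, y in zip(new, s[key])))
            s[key] = new
        if delta < tol:
            break
    full = {i: normalized(i) for i in node_clauses}
    return None if any(v is None for v in full.values()) else full


def _fix(clauses, node, value):
    """Set sigma(node) = value: drop satisfied clauses and delete the node's false literals."""
    return [tuple((j, b) for j, b in cl if j != node)
            for cl in clauses if not any(j == node and value != b for j, b in cl)]


def survey_decimation(n, clauses, eps=1e-3, iters=1000, tol=1e-6, seed=0):
    """Return a legal assignment as a list of bools, or None if the procedure gets lost."""
    rng = random.Random(seed)
    fixed = {}
    while True:
        if any(len(cl) == 0 for cl in clauses):
            return None  # an empty clause: contradiction
        full = _full_surveys(clauses, iters, tol, rng)
        if full is None:
            return None
        # Most polarized node: a stand-in for the minimal-Delta(i) rule of the text.
        node, (w_t, _, w_f) = max(full.items(), key=lambda kv: abs(kv[1][0] - kv[1][2]),
                                  default=(None, FREE))
        if node is None or abs(w_t - w_f) < eps:
            break  # only the trivial survey solution is left
        fixed[node] = w_t > w_f
        clauses = _fix(clauses, node, fixed[node])
    # Placeholder for the "easy" residual problem: exhaustive search, tiny instances only.
    free = sorted({j for cl in clauses for j, _ in cl})
    for values in itertools.product((False, True), repeat=len(free)):
        sigma = dict(zip(free, values))
        if all(any(sigma[j] != b for j, b in cl) for cl in clauses):
            return [fixed.get(i, sigma.get(i, False)) for i in range(n)]
    return None
```

For example, with a unit clause forcing σ(0) = true, the surveys first freeze node 0, the instance is simplified, and the remaining two-literal clause is solved by the final search.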

The quantity Δ(i) may be estimated analytically, so that it is possible to choose the variable with minimal Δ(i). A careful analysis of the results for large but finite N shows that the algorithm works in the limit of infinite N up to α_A ≈ 4.252, which is definitely less than, but very near to, α_c.

Unfortunately, at the present moment this result for α_A can be obtained only by analyzing how the argument works on a finite sample, and we are unable to write down integral equations for the probability distributions of the solution of the survey-propagation equations. Because of this drawback, α_A cannot be computed analytically, and it is a rather difficult task to understand in detail why α_A is so near to α_c. It is interesting to note that for α < α_A the survey decimation algorithm takes a time that is polynomial in N, and with a smart implementation the time is nearly linear in N. It is important to stress that the survey algorithm is an incomplete search procedure, which may fail to find any solution of a satisfiable instance. This actually happens with non-negligible probability, e.g. for sizes of the order of a few thousand variables, even when α < α_A; however, in the limit N → ∞ it should work as soon as α < α_A.

In the interval α_A < α < α_c the decimation procedure leads to a regime of negative complexity, so that the algorithm does not work. Unfortunately, there is no analytic computation of α_A. It is likely that the fact that α_U ≈ 4.36 is not far from α_c is related to the fact that α_c − α_A is small.

It would be very interesting to understand better whether such a relation holds and to put it in quantitative form. In this regard it would be important to study the K dependence of α_c − α_A and α_U − α_c. This analysis may give some hints as to why α_c − α_A is so small in 3-SAT.



5 Conclusions 

Similar problems have been studied by physicists in the case of infinite-range spin glasses [7]: here the problem consists in finding the minimum E_J of the quantity

$$H_J[\tau] = \sum_{i,k=1}^{N} J_{i,k}\, \tau_i \tau_k, \tag{20}$$

where the minimum is taken with respect to the variables τ_i = ±1, and the J are independent Gaussian variables with zero average and variance N^{-1}. Physical intuition tells us that in the limit N → ∞ the intensive quantity E_J/N should be (with probability 1) independent of N; it will be denoted by e_∞. In 1979 it was argued, using the so-called replica method (which will not be discussed in this note), that e_∞ is equal to the maximum of a certain functional F[q], where q(x) is a function defined in the interval [0,1]. Later on, in 1985, the same results were rederived using heuristic probabilistic considerations, similar to those presented here (but much more complex). In this note we have introduced a hierarchical construction in which three levels (configurations, beliefs, surveys) are present; in the case of spin glasses an infinite number of levels is needed (there the survey equations do not have a unique solution, and we have to consider higher and higher levels of abstraction). Only very recently has Talagrand [28], heavily using Guerra's ideas and results, been able to prove that the value of e_∞ computed 24 years before was correct.

The possibility of using these techniques to derive, eventually, exact results on the K-SAT problem is a very recent one: only a few years ago [14,15] were the previous techniques extended to more complex spin glass models where the matrix J is sparse (it has an average number of elements per row that does not increase with N).

The field is very young, and rigorous proofs of many steps of the construction (even those that would likely be relatively simple) are lacking. We only know, as a consequence of general theorems, that these methods give an upper bound to the value of α_c, and this upper bound should be computed by maximizing an appropriate functional of the probability distribution of the surveys. This upper bound is a rigorous one [17]: it essentially uses positivity arguments (the average of a non-negative function is non-negative) in a very smart way, and it does not depend on the existence or uniqueness of the solutions of the equations for the probability distribution of the surveys. On the contrary, the way we have followed to compute this upper bound (i.e. α*) requires some extra work before becoming fully rigorous. I stress that these upper bounds and Talagrand's exact result do not in any way require considerations on the solutions of the survey-propagation equations (or of their generalizations) on a finite sample. The survey-propagation equations are crucial for giving an intuitive picture of the situation (i.e. at a metaphorical level) and for constructing the survey decimation algorithm. The heuristic derivation could have been done using the replica method, where survey-propagation equations are never mentioned, but the argument is much more difficult to follow and to turn into a rigorous one.

The other results come from empirical (sometimes very strong) numerical evidence and from heuristic arguments. For example, to my knowledge there is no proof that the integral equations for the probability of the surveys (or of the beliefs) have a unique solution and that the population dynamics algorithm converges (to that unique solution). Proofs in this direction would be very useful and are a necessary step towards rigorous quantitative upper bounds and, eventually, exact results. On the other hand, a proof of the existence of a unique solution (or quasi-solution) of the survey (or belief) propagation equations in the large-N limit is lacking for any value of α, although the analysis with the population dynamics (whose precise mathematical properties have to be clarified) tells us what the maximum values (α_U and α_d, respectively) below which these uniqueness properties hold should be. Many steps remain to be taken, but a rigorous determination of α_c seems to be a feasible task in the not too distant future.